Automated Suggestions for Miscollocations
Abstract
One of the most common and persistent error types in second language writing is the collocation error, such as learn knowledge instead of gain or acquire knowledge, or make damage rather than cause damage. In this work-in-progress report, we propose a probabilistic model for suggesting corrections to lexical collocation errors. The model incorporates three features: word association strength (MI), semantic similarity (via WordNet) and the notion of shared collocations (or intercollocability). The results suggest that the combination of all three features outperforms any single feature or any combination of two features.

1 Collocation in Language Learning

The importance and difficulty of collocations for second language users have been widely acknowledged, and various sources of the difficulty have been put forth (Granger 1998, Nesselhauf 2004, Howarth 1998, Liu 2002, inter alia). Liu’s study of a 4-million-word learner corpus reveals that verb-noun (VN) miscollocations make up the bulk of the lexical collocation errors in learners’ essays. Our study therefore focuses on VN miscollocation correction.

2 Error Detection and Correction in NLP

Error detection and correction have been two major issues in NLP research in the past decade. Projects that use learner corpora to analyze and categorize learner errors include the NICT Japanese Learners of English (JLE) corpus, the Chinese Learners of English Corpus (Gamon et al., 2008) and the English Taiwan Learner Corpus (or TLC) (Wible et al., 2003). Studies that focus on providing automatic correction, however, mainly deal with errors that derive from closed-class words, such as articles (Han et al., 2004) and prepositions (Chodorow et al., 2007). One goal of this work-in-progress is to address the less studied issue of open-class lexical errors, specifically lexical collocation errors.
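The word association strength (MI) feature named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes MI is computed as pointwise mutual information over verb-noun co-occurrence counts, and the counts, the `pmi` function and the `rank_candidates` helper are all hypothetical, invented here for illustration. The semantic similarity and shared-collocation features are not modeled.

```python
import math
from collections import Counter

# Toy verb-noun co-occurrence counts, invented for illustration only.
# A real system would extract these from a large reference corpus.
pair_counts = Counter({
    ("acquire", "knowledge"): 30,
    ("gain", "knowledge"): 25,
    ("gain", "experience"): 20,
    ("learn", "knowledge"): 1,
    ("learn", "language"): 40,
    ("cause", "damage"): 35,
    ("make", "damage"): 1,
    ("make", "decision"): 50,
})


def pmi(verb, noun, counts):
    """Pointwise mutual information (base 2) of a verb-noun pair.

    Assumption: MI = log2( P(v, n) / (P(v) * P(n)) ), estimated from counts.
    Unseen pairs get negative infinity.
    """
    total = sum(counts.values())
    pair = counts[(verb, noun)]
    if pair == 0:
        return float("-inf")
    verb_total = sum(c for (v, _), c in counts.items() if v == verb)
    noun_total = sum(c for (_, n), c in counts.items() if n == noun)
    return math.log2(pair * total / (verb_total * noun_total))


def rank_candidates(noun, candidate_verbs, counts):
    """Order candidate verbs for a noun by descending association strength."""
    return sorted(candidate_verbs, key=lambda v: pmi(v, noun, counts), reverse=True)


ranking = rank_candidates("knowledge", ["learn", "gain", "acquire"], pair_counts)
print(ranking)
```

With these toy counts the learner's verb learn is weakly associated with knowledge, so it falls below gain and acquire in the ranking. In the model described above, such a score would be only one of three features combined to rank suggestions.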